01. Assessment

In Reinforcement Learning, what principle do Monte Carlo methods use to update the state value function V(s)?

SOLUTION: Play out an entire episode, then update V(s) for each state s encountered using the return observed from that state to the end of the episode.
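
Below is a minimal sketch of this idea as every-visit Monte Carlo prediction. The environment interface (`reset()`, `step(action)` returning `(next_state, reward, done)`) and the `policy` callable are hypothetical placeholders, not part of the original question.

```python
from collections import defaultdict

def mc_prediction(env, policy, num_episodes, gamma=0.99):
    """Every-visit Monte Carlo prediction of V(s) under a fixed policy."""
    V = defaultdict(float)
    visit_count = defaultdict(int)

    for _ in range(num_episodes):
        # 1. Play out an entire episode before doing any updates.
        episode = []
        state = env.reset()
        done = False
        while not done:
            action = policy(state)
            next_state, reward, done = env.step(action)
            episode.append((state, reward))
            state = next_state

        # 2. Walk backwards, accumulating the return G from each state
        #    to the end of the episode, and update V(s) as a running average.
        G = 0.0
        for state, reward in reversed(episode):
            G = reward + gamma * G
            visit_count[state] += 1
            V[state] += (G - V[state]) / visit_count[state]

    return V
```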

When training a Deep Q Network, how would you mitigate the adverse effects of correlation between consecutive experience tuples?

SOLUTION: Store the experience tuples in a replay buffer, then randomly sample a minibatch from it at every training iteration (experience replay).
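
A simple sketch of such a replay buffer is shown below; the class name, capacity, and batch size are illustrative choices rather than anything prescribed by the question.

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-size buffer of experience tuples. Uniform random sampling of
    minibatches breaks the temporal correlation between consecutive
    transitions collected from the environment."""

    def __init__(self, capacity=100_000):
        self.buffer = deque(maxlen=capacity)  # oldest experiences evicted first

    def add(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size=64):
        # Uniform random sampling decorrelates the training batch.
        batch = random.sample(self.buffer, batch_size)
        states, actions, rewards, next_states, dones = zip(*batch)
        return states, actions, rewards, next_states, dones

    def __len__(self):
        return len(self.buffer)
```

In use, each transition is added to the buffer as it is collected, and once the buffer holds at least `batch_size` transitions, a batch is sampled to compute the Q-learning targets for a gradient step.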

Policy Gradient methods in Deep Reinforcement Learning have an advantage over value-function approximation methods because:

SOLUTION: They can be used directly with continuous action spaces.
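
The sketch below illustrates why this works: a policy network can output the parameters of a continuous distribution (here a Gaussian) and sample an action directly, with no argmax over a discrete action set. The network architecture and the REINFORCE-style loss are illustrative assumptions, not a prescribed implementation.

```python
import torch
import torch.nn as nn

class GaussianPolicy(nn.Module):
    """Policy for a continuous action space: outputs the mean and log-std
    of a Gaussian, so actions are sampled directly from the distribution."""

    def __init__(self, state_dim, action_dim, hidden=64):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(state_dim, hidden), nn.Tanh())
        self.mean = nn.Linear(hidden, action_dim)
        self.log_std = nn.Parameter(torch.zeros(action_dim))

    def forward(self, state):
        h = self.body(state)
        return torch.distributions.Normal(self.mean(h), self.log_std.exp())

# REINFORCE-style update for one sampled step, where `state`, `action`,
# and the return `G` come from a collected episode (placeholders here):
#   dist = policy(state)
#   loss = -(dist.log_prob(action).sum() * G)
#   loss.backward()
```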